Deep Neural Network for Multiclass Classification Using Keras.
CIFAR-10 Image Classification using Keras
With the increasing adoption of Deep Neural Nets (DNNs) for various machine learning tasks, acquaintance with different frameworks and tools for modeling complex machine learning problems is a must.
In this series, we shall focus on building various DNN architectures using different frameworks (TensorFlow/Keras, PyTorch, etc.).
The idea behind these articles is to familiarize ourselves with the syntax and good practices for building DNN models. Let's kick off the series with a Convolutional Neural Network model for classifying images using Keras.
About the Dataset -
To limit the training times, we shall be working with the CIFAR-10 dataset, which consists of images across 10 categories.
These categories are birds, airplanes, cars, cats, deer, dogs, frogs, horses, ships, and trucks.
- Each category consists of 6000 images.
- For more details about the dataset, check out the official website.
Let's start by importing the libraries needed for training the model.
import IPython
import os
import glob
from operator import itemgetter
import scipy.io
import pandas as pd
import numpy as np
import keras.backend as K
from keras.layers import (Dense, Conv2D, Activation, Dropout,
Input, MaxPooling2D, Flatten, BatchNormalization, LeakyReLU)
from keras.models import Model
from keras import optimizers
from keras.utils import plot_model
from keras.utils.vis_utils import model_to_dot
from keras import callbacks
from keras import regularizers
from keras.datasets import mnist
from sklearn.utils import shuffle
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.core.display import display,HTML
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
display(HTML("<style>.container { width:100% !important; }</style>"))
Before we jump into the implementation bits, it would be helpful to chalk out the roadmap for successfully building and training a Deep Neural Network.
1. Loading the data -
- First we shall implement the code for reading the CIFAR-10 dataset.
2. Visualizing data-
- Visualizing different aspects of our data often helps us to understand the problem statement and underlying dataset better.
- This visualization step can be as detailed as one would like it to be.
- This includes performing exploratory analysis, outlier analysis, image visualization, class distribution checks, etc.
- Data visualization and pre-processing steps are often underrated, but the analysis directly influences the choice of evaluation metric and of certain hyperparameters in your ML model.
- For instance, consider a dataset with severe class imbalance, say a 99:1 ratio. That is, for every 99 positive data points, we have 1 negative data point. With accuracy as the evaluation metric, a model that always predicts the positive class scores 99% while learning nothing about the negative class, so the number is misleading. In such cases, a quick visualization of the class distribution will help us choose better evaluation metrics, such as precision, recall, the F1 score or the more general F-beta score.
3. Preparing Data for Training, Validation and Prediction -
- Often, loading the entire dataset into memory is not efficient, or not even possible.
- Different Deep Learning frameworks offer various wrappers for loading datasets using generators.
- To gain better insights into how data batches should be generated, we shall implement a custom data generator.
4. Defining / Building the Model -
- In this section, we shall build the actual model.
- The focus of this section would be to familiarize ourselves with Keras syntax for adding complex layers.
- In this section, we shall also highlight good practices that can help our model to learn faster.
5. Visualizing the Model -
- Before we train the model, it often helps to visualize the neural network that has been implemented.
- This step also helps in catching mistakes, such as a mis-wired layer or an unexpected tensor shape, that may have crept into the model unnoticed.
6. Training the model -
- Once a model has been defined and the data is ready to flow through it, the next logical step is to train the model.
- We shall train a model using training and validation data.
7. Predicting -
- DNNs are known to overfit easily.
- It is often helpful to test the model on held-out data points that were used neither for training nor for validation. This will help us understand how well the model generalizes to unseen data.
8. Troubleshooting / FAQs
- In this last section, we shall focus on a few issues that one may encounter while training DNNs.
- I will keep updating this list to ensure that all the possible pitfalls are documented.
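The class-imbalance point from step 2 can be made concrete with a few lines of plain Python. This is only a sketch: the 99:1 labels are synthetic, and the helper names (accuracy, precision_recall_f1) are my own, not functions from Keras or sklearn.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for the class passed as `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Synthetic 99:1 dataset: 99 samples of class 1, a single sample of class 0.
y_true = [1] * 99 + [0]
# A "model" that has learnt nothing and always predicts the majority class.
y_pred = [1] * 100

acc = accuracy(y_true, y_pred)
minority_scores = precision_recall_f1(y_true, y_pred, positive=0)
print("accuracy:", acc)
print("precision / recall / F1 on the rare class:", minority_scores)
```

Accuracy looks excellent here, while every metric computed on the rare class is zero, which is exactly why the class distribution is worth plotting before picking a metric.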
Let's kick off the implementation by writing some basic functions for loading the dataset.
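### reading Labels file
def unpickle(file_name):
    '''
    Unpickles a pickled file.
    Args:
        file_name : absolute path of the file that needs to be unpickled.
    Returns:
        dict_val: python dictionary containing data and labels. eg - {'data': np.array, 'labels': list}
    '''
    # Python 3: cPickle was merged into pickle. The CIFAR-10 batches were pickled
    # under Python 2, so an explicit encoding is needed; 'latin1' keeps the keys as str.
    import pickle
    with open(file_name, 'rb') as fo:
        dict_val = pickle.load(fo, encoding='latin1')
    return dict_val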
## read data
def read_cifar_data(parent_directory_wildcard):
    '''
    The cifar-10 dataset is available in part files and hence this function will unpickle and concatenate
    the content of all the files matching the wildcard.
    Args:
        parent_directory_wildcard: glob pattern matching all the training batch files
    Return:
        raw_X: list of flattened image arrays available for training.
        raw_labels: list of the corresponding integer labels.
    '''
    raw_X = []
    raw_labels = []
    for filename in glob.glob(parent_directory_wildcard):
        dict_val = unpickle(filename)
        raw_X.extend(dict_val['data'])
        raw_labels.extend(dict_val['labels'])
    return raw_X, raw_labels
Now that we have our helper functions ready, let's load the entire dataset into a pandas DataFrame.
raw_X, raw_labels = read_cifar_data('./data/cifar10/cifar-10-python/cifar-10-batches-py/data_batch*')
# list(...) because zip returns a lazy iterator in Python 3
cifar_raw_data = pd.DataFrame(list(zip(raw_X, raw_labels)), columns = ['np_images', 'labels'])
print(cifar_raw_data.head())
Label Binarizer -
- It can be seen that, for each training image, the label is an integer.
- Before we train our classifier, we need to transform these categorical labels into one-hot encoded vectors.
- We shall achieve this using sklearn's LabelBinarizer.
- To build intuition for LabelBinarizer, check out the toy examples on the sklearn documentation page.
label_binarizer = LabelBinarizer()
label_binarizer.fit(cifar_raw_data['labels'].unique())
Now that we have fitted our LabelBinarizer, let's quickly check the unique classes it has encoded. This is just a sanity check and can be skipped.
print("Successfully trained a Binarizer with %d unique classes"%len(label_binarizer.classes_))
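To see what the binarizer produces for integer labels, here is a NumPy-only sketch of one-hot encoding. It is not sklearn's implementation; the helpers one_hot and decode are hypothetical names, and I am assuming the labels are the integers 0 to 9, as they are in CIFAR-10.

```python
import numpy as np


def one_hot(labels, n_classes):
    """Turn integer labels into rows with a single 1 at the label's position."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, n_classes), dtype=int)
    encoded[np.arange(labels.size), labels] = 1
    return encoded


def decode(encoded):
    """Recover the integer label from each one-hot row."""
    return np.argmax(encoded, axis=1)


example_labels = [3, 0, 9]
encoded = one_hot(example_labels, n_classes=10)
print(encoded)
print(decode(encoded))
```

Each row contains exactly one 1, and taking the argmax of a row gives the original label back. The same argmax trick is what we shall use later to turn the model's probability vectors into predicted classes.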
Awesome !!
We are now ready to move on to step 2 of our blueprint - Data Visualization
def get_shuffled_data(grouped_data, n = 5):
    shuffled_data = shuffle(grouped_data).head(n)
    return shuffled_data
sampled_data = cifar_raw_data.groupby('labels', as_index = False).apply(get_shuffled_data, 3).reset_index(drop = True)
def generate_visualizations(vis_df):
    ## fix the height and width of the figure to be displayed
    height, width = 20, 5
    # decide the number of images to be displayed from each category
    columns = 3
    # Find the number of unique categories in the training dataset
    rows = vis_df['labels'].nunique()
    fig, axes = plt.subplots(nrows=rows, ncols=columns, figsize=(width, height), sharex=True, sharey=True)
    labels = vis_df['labels'].unique()
    for index in range(columns * rows):
        # draw on the axes created above instead of stacking new subplots on top of them
        ax = axes.flat[index]
        # CIFAR rows are stored channel-first; convert to height x width x channel for imshow
        ax.imshow(vis_df['np_images'].iloc[index].reshape(3, 32, 32).transpose(1, 2, 0)/255., interpolation='nearest')
        # hide the ticks but keep the axis, so that the row labels stay visible
        ax.set_xticks([])
        ax.set_yticks([])
    for ax, row in zip(axes[:, 0], labels):
        ax.set_ylabel(row)
generate_visualizations(sampled_data)
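The reshape(3, 32, 32).transpose(1, 2, 0) call used above is easier to trust once it has been tried on a row whose contents we know. The sketch below assumes the usual CIFAR-10 layout of a flat 3072-value row holding the red, green and blue planes one after another; the synthetic row and the helper row_to_image are mine, for illustration only.

```python
import numpy as np


def row_to_image(flat_row):
    """Convert a flat channel-first row into a height x width x channel image."""
    return np.asarray(flat_row).reshape(3, 32, 32).transpose(1, 2, 0)


# Synthetic row: every red value is 10, every green value 20, every blue value 30.
flat = np.concatenate([
    np.full(1024, 10),
    np.full(1024, 20),
    np.full(1024, 30),
]).astype(np.uint8)

image = row_to_image(flat)
print(image.shape)
print(image[5, 7])  # the three channel values of one pixel
```

Every pixel of the resulting image carries one value from each plane, which is the layout matplotlib's imshow expects.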
Some Observations -
- It can be seen from the above visualizations that the images are small (32x32 pixels) and therefore coarse.
- By design, the categories are mutually exclusive. However, things can get tricky for datasets with low-resolution images and categories that are visually similar.
On that note, let's move on to Step 3 - Preparing data for Training and Validation
- Since there is no class imbalance, let's keep the logic for splitting the dataset into training, validation and test sets fairly simple.
- We shall piggyback on sklearn's train_test_split function to accomplish the task.
- It is important to note that this function only splits data two ways, and hence we shall call it twice: once to hold out the test set, and once more to carve a validation set out of the remaining training data.
For more information on sklearn's train_test_split function, check out the official documentation.
#### split raw_data into train, test and validation sets
complete_train_data, test_data = train_test_split(cifar_raw_data, test_size = 0.05)
train_data, validation_data = train_test_split(complete_train_data, test_size = 0.08)
train_data = train_data.reset_index(drop = True)
validation_data = validation_data.reset_index(drop = True)
test_data = test_data.reset_index(drop = True)
print("Training data is of size %d"%len(train_data))
print("Validation data is of size %d"%len(validation_data))
print("Test data is of size %d"%len(test_data))
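The two-stage split is easy to get wrong by a few rows, so here is a small index-only sketch of the same idea. It assumes the five training batches hold 50,000 rows in total; three_way_split is a hypothetical helper, and its rounding may differ slightly from what sklearn does internally.

```python
import random


def three_way_split(n_rows, test_frac, val_frac, seed=0):
    """Shuffle row indices, hold out a test set, then split the rest again."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)

    n_test = int(round(n_rows * test_frac))
    test_idx, remaining = indices[:n_test], indices[n_test:]

    # val_frac applies to what is left after the test set has been removed.
    n_val = int(round(len(remaining) * val_frac))
    val_idx, train_idx = remaining[:n_val], remaining[n_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = three_way_split(50000, test_frac=0.05, val_frac=0.08)
print(len(train_idx), len(val_idx), len(test_idx))
```

Note that the 8% is taken from the 47,500 rows that remain after the test split, not from the full 50,000, so the validation set is a little smaller than 8% of the original data.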
Now that we have our training, validation and test sets cut out, we would like the training process to be memory efficient. To implement mini-batch training, we shall leverage generators in Python. To learn more about the difference between a generator and an iterator in Python, check out this blog post.
To generate mini-batches of our data, we shall implement a generator function involving the following steps -
Args: Pass the original dataframe, the fitted LabelBinarizer and batch_size as input arguments. We could easily make the function more reusable by loading the LabelBinarizer from a pickle file. However, I would like to keep it as simple as possible for the sake of this article.
Initialize a counter variable.
While True:
--> Compute the batch indices based on batch_size.
--> Slice the dataset.
--> Yield the batch.
def prep_training_generators(train_df, label_binarizer_model, batch_size):
    # a NumPy array (not a range object) so that it can be shuffled in place
    indices = np.arange(len(train_df))
    indx_iterator = 0
    while True:
        # not enough rows left for a full batch: reshuffle and start a new pass
        if (indx_iterator +1) * batch_size > len(indices):
            indx_iterator = 0
            np.random.shuffle(indices)
        batch_indices = indices[indx_iterator* batch_size: (indx_iterator+1)*batch_size]
        batch_x = np.stack(train_df.loc[batch_indices, 'np_images'].values, axis = 0)
        # channel-first rows -> (batch, height, width, channel), scaled to [0, 1]
        reshaped_batch_x = (batch_x.reshape(batch_size, 3, 32, 32).transpose(0, 2, 3, 1))/255.
        raw_y = train_df.loc[batch_indices, 'labels'].values
        batch_y = label_binarizer_model.transform(raw_y)
        indx_iterator += 1
        yield reshaped_batch_x, batch_y
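The wrap-around behaviour of the generator is worth seeing on a toy example. The sketch below keeps only the index bookkeeping and leaves out the images and labels; batch_index_generator is a hypothetical stand-in, and the seeded RandomState is there only to make the run repeatable.

```python
import numpy as np


def batch_index_generator(n_rows, batch_size, seed=0):
    """Yield full batches of row indices forever, reshuffling on every new pass."""
    rng = np.random.RandomState(seed)
    indices = np.arange(n_rows)
    step = 0
    while True:
        # Not enough rows left for a full batch: start a new, reshuffled pass.
        if (step + 1) * batch_size > n_rows:
            step = 0
            rng.shuffle(indices)
        # copy() so that a later in-place shuffle cannot alter a batch already handed out
        yield indices[step * batch_size:(step + 1) * batch_size].copy()
        step += 1


gen = batch_index_generator(n_rows=10, batch_size=4)
first_pass = [next(gen) for _ in range(2)]
third_batch = next(gen)  # comes from the second, shuffled pass
print(first_pass, third_batch)
```

With 10 rows and a batch size of 4, only two full batches fit in a pass, so rows 8 and 9 are skipped the first time around. Because the indices are reshuffled on every pass, different rows are left out each time and, over several epochs, every row gets used.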
We shall combine steps 4 and 5 in the next section -
- In this article, we will be building a model using Keras's Functional API.
- I prefer the Functional API since it allows us to define and build more flexible models with Keras.
- For more information on the differences between the Functional and Sequential APIs, and the advantages of the former, refer to this blog post.
Each convolutional block of our neural network shall consist of the following elements -
- Conv2D layer.
- BatchNormalization.
- Activation layer.
- MaxPooling2D (skipped in the first block).
- Dropout layer.
- The output layer shall have as many neurons as there are unique classes.
- Note that, since this is a multiclass classification problem, categorical cross-entropy will be our choice of loss function. For binary classification, one can use binary_crossentropy.
- Also, note the activation function in the output layer. Since we would like a probability vector at the output layer, we shall be using the softmax function.
With these basic building blocks in mind, feel free to experiment with the network architecture.
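To build some intuition for the softmax output and the categorical cross-entropy loss mentioned above, here is a NumPy sketch of both. It is not Keras's implementation; the max-shift inside softmax and the eps clipping are common numerical-stability tricks that I am assuming here, and the function names are my own.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into a probability vector along the last axis."""
    # Subtracting the row maximum leaves the result unchanged but avoids overflow in exp.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def categorical_crossentropy(one_hot_targets, probabilities, eps=1e-7):
    """Negative log-probability assigned to the true class, one value per sample."""
    clipped = np.clip(probabilities, eps, 1.0)  # guards against log(0)
    return -np.sum(one_hot_targets * np.log(clipped), axis=-1)


# An untrained model that scores all 10 classes equally.
logits = np.zeros((1, 10))
probs = softmax(logits)
target = np.eye(10)[[3]]  # the true class is 3
loss = categorical_crossentropy(target, probs)
print(probs.sum(), loss)
```

A model that spreads its probability evenly over 10 classes gets a loss of ln(10), roughly 2.3. That is a handy reference point: a training loss that starts near this value and then falls is behaving as expected.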
def keras_image_functional_model(image_shape, n_classes):
    """
    More info on losses - https://keras.io/losses/
    """
    weight_decay = 1e-4
    img_input = Input(shape = image_shape)
    img_emd = Conv2D(32, (3, 3), padding = 'same', strides=1, kernel_regularizer=regularizers.l2(weight_decay))(img_input)
    img_emd = BatchNormalization()(img_emd)
    img_emd = LeakyReLU(alpha = 0.2)(img_emd)
    img_emd = Dropout(0.2)(img_emd)
    img_emd = Conv2D(32, (3, 3), padding = 'same', kernel_regularizer=regularizers.l2(weight_decay))(img_emd)
    img_emd = BatchNormalization()(img_emd)
    img_emd = LeakyReLU(alpha = 0.2)(img_emd)
    img_emd = MaxPooling2D()(img_emd)
    img_emd = Dropout(0.2)(img_emd)
    img_emd = Conv2D(64, (3, 3), padding = 'same', kernel_regularizer=regularizers.l2(weight_decay))(img_emd)
    img_emd = BatchNormalization()(img_emd)
    img_emd = LeakyReLU(alpha = 0.2)(img_emd)
    img_emd = MaxPooling2D()(img_emd)
    img_emd = Dropout(0.2)(img_emd)
    img_emd = Flatten()(img_emd)
    img_emd = Dense(512, kernel_regularizer=regularizers.l2(weight_decay))(img_emd)
    img_emd = BatchNormalization()(img_emd)
    img_emd = Activation('relu')(img_emd)
    img_emd = Dense(128, kernel_regularizer=regularizers.l2(weight_decay))(img_emd)
    img_emd = BatchNormalization()(img_emd)
    img_emd = Activation('relu')(img_emd)
    out_logits = Dense(n_classes, activation = 'softmax')(img_emd)
    model = Model(inputs = img_input, outputs = out_logits, name = "image_model")
    model.summary(line_length=200)
    model.compile(optimizer= 'adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
def plot_keras_model(model_name, show_shapes_bool = True):
    return IPython.display.SVG(model_to_dot(model_name, show_shapes= show_shapes_bool).create(prog='dot', format='svg'))
BATCH_SIZE = 32
tboard = callbacks.TensorBoard(log_dir='./logs', histogram_freq=0, batch_size=32, write_graph=True, write_grads=False,
write_images=False, embeddings_freq=0)
train_data_gen = prep_training_generators(train_data, label_binarizer, BATCH_SIZE)
validation_data_gen = prep_training_generators(validation_data, label_binarizer, BATCH_SIZE)
image_only_model = keras_image_functional_model(image_shape = (32, 32, 3) , n_classes=10)
# plot_keras_model(image_only_model, to_file='keras_cnn_cifar_model.png')
plot_model(image_only_model, to_file='keras_cnn_cifar_model.png', show_shapes=True, show_layer_names=True)
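As a cross-check on the model summary and plot, the tensor shapes and weight counts of the network defined above can be traced by hand. The sketch below is plain arithmetic rather than Keras code; the helper names are mine, and the BatchNormalization parameters are deliberately left out of the count.

```python
def conv_same(shape, filters):
    """A 3x3 convolution with 'same' padding and stride 1 keeps height and width."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool_2x2(shape):
    """The default 2x2 max pooling halves height and width."""
    height, width, channels = shape
    return (height // 2, width // 2, channels)


def conv_params(kernel, in_channels, out_channels):
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out


shape = (32, 32, 3)
shape = conv_same(shape, 32)   # first conv block, no pooling
shape = conv_same(shape, 32)   # second conv block ...
shape = max_pool_2x2(shape)    # ... followed by pooling
shape = conv_same(shape, 64)   # third conv block ...
shape = max_pool_2x2(shape)    # ... followed by pooling
flat_units = shape[0] * shape[1] * shape[2]

conv_dense_weights = (
    conv_params(3, 3, 32) + conv_params(3, 32, 32) + conv_params(3, 32, 64)
    + dense_params(flat_units, 512) + dense_params(512, 128) + dense_params(128, 10)
)
print(shape, flat_units, conv_dense_weights)
```

The first Dense layer, which connects the flattened feature map to 512 units, accounts for the overwhelming majority of the weights. If you want a smaller model, that is the first place to look.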
Step 6 - Let's train the model without further ado!
a. Pay special attention to the parameters steps_per_epoch and validation_steps.
b. steps_per_epoch is the number of batches drawn from the training generator before Keras considers an epoch finished and starts the next one. Shuffling is not done by Keras here; our generator reshuffles the data itself whenever it runs out of full batches.
c. validation_steps plays the same role for the validation generator. The two are separate parameters because the training and validation sets differ in size.
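# fit_generator matches the Keras version used in this article; newer releases accept generators in fit() directly.
# The step counts are cast to int because np.ceil returns a float.
history = image_only_model.fit_generator(train_data_gen,
                                         steps_per_epoch=int(np.ceil(len(train_data) / BATCH_SIZE)),
                                         epochs=15,
                                         validation_data = validation_data_gen,
                                         validation_steps = int(np.ceil(len(validation_data) / BATCH_SIZE)),
                                         verbose=1, callbacks=[tboard])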
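The step counts passed to the training call can be worked out by hand. The sketch below assumes the split sizes that follow from 50,000 training rows (43,700 for training and 3,800 for validation) and a batch size of 32; steps_for and full_batches are hypothetical helper names.

```python
def steps_for(n_rows, batch_size):
    """Ceiling division: number of batches needed to cover every row once."""
    return -(-n_rows // batch_size)


def full_batches(n_rows, batch_size):
    """Floor division: number of complete batches available in one pass."""
    return n_rows // batch_size


BATCH = 32
train_steps = steps_for(43700, BATCH)
val_steps = steps_for(3800, BATCH)
train_full = full_batches(43700, BATCH)
print(train_steps, val_steps, train_full)
```

The ceiling is one larger than the number of full batches whenever the row count is not a multiple of the batch size. Since our training generator only ever yields full batches, the last step of each epoch is then served from the next, freshly shuffled pass, which is harmless for training.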
## Prediction
def prep_test_generators(test_df, batch_size):
    indices = np.arange(len(test_df))
    indx_iterator = 0
    while True:
        # start over only after every row, including the final partial batch, has been served
        if indx_iterator * batch_size >= len(indices):
            indx_iterator = 0
        batch_indices = indices[indx_iterator* batch_size: (indx_iterator+1)*batch_size]
        batch_x = np.stack(test_df.loc[batch_indices, 'np_images'].values, axis = 0)
        # -1 instead of batch_size so that the last, smaller batch is reshaped correctly
        reshaped_batch_x = batch_x.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)/255.
        indx_iterator += 1
        yield reshaped_batch_x
test_pred_gen = prep_test_generators(test_data, batch_size=BATCH_SIZE)
# the output layer is a softmax, so these are class probabilities despite the variable name
test_prediction_logits = image_only_model.predict_generator(test_pred_gen,
                                                            steps=int(np.ceil(len(test_data)/ BATCH_SIZE)))
predicted_values = np.argmax(test_prediction_logits, axis = -1)
# one prediction per test row, in order, so the labels can be compared directly
test_accuracy = float(np.sum(np.equal(predicted_values, test_data['labels'].values)))/len(test_data)
print("Test Accuracy is %f"%(test_accuracy*100))
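A single accuracy number can hide classes that the model handles poorly. As a follow-up to the computation above, here is a NumPy sketch that breaks accuracy down per class. The probabilities and labels are made up for illustration, and per_class_accuracy is a hypothetical helper, not a Keras or sklearn function.

```python
import numpy as np


def per_class_accuracy(probabilities, true_labels, n_classes):
    """For each class present in true_labels, the share of its samples predicted correctly."""
    predicted = np.argmax(probabilities, axis=-1)
    true_labels = np.asarray(true_labels)
    result = {}
    for cls in range(n_classes):
        mask = true_labels == cls
        if mask.any():
            result[cls] = float(np.mean(predicted[mask] == cls))
    return result


# Made-up softmax outputs for four samples and three classes.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
])
labels = [0, 1, 1, 2]
scores = per_class_accuracy(probs, labels, n_classes=3)
print(scores)
```

On CIFAR-10, I would expect a breakdown like this to show visually similar categories, such as cats and dogs, scoring lower than the rest, but that is something to verify on your own run rather than take from me.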
End Comments -
- Evidently, the DNN does much better than random guessing, which would score roughly 10% on ten balanced classes.
- Comparable accuracy on the training and test data suggests that the model generalizes and is not overfitting.
- Finally, the purpose of this article is to highlight the process of building a DNN using Keras. Feel free to experiment with different architectures and check how accuracy improves.
- Note that the bigger the model, the longer the training time.
- I will soon try to upload an article in which we shall leverage pre-trained models for faster training and better accuracy.
Troubleshooting -
Why is my loss NaN / Inf?
- Be careful while choosing the activation function, especially ReLU. ReLU is unbounded above, so large activations can build up across layers and may result in an exploding gradient problem.
- Use batch normalization to reduce the probability of encountering exploding gradients.
- Gradient clipping is also widely adopted to counter exploding gradients.
- If you are using a custom loss function, ensure that situations such as 0/0 or log(0) cannot arise, since these introduce NaN/Inf into the computations.
- Check out the strategies suggested for a similar issue on Stack Overflow.
- If the problem persists, reach out to the community and seek help.
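Gradient clipping, mentioned in the list above, is simple enough to sketch in NumPy. This shows clipping by global norm on made-up gradient arrays; clip_by_global_norm is a hypothetical helper written for illustration, not the routine Keras runs internally.

```python
import numpy as np


def clip_by_global_norm(gradients, max_norm):
    """Rescale all gradients together so that their combined L2 norm is at most max_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in gradients))
    if global_norm <= max_norm:
        return [g.copy() for g in gradients], global_norm
    scale = max_norm / global_norm
    return [g * scale for g in gradients], global_norm


# Made-up gradients for two weight tensors; their combined norm is 13.
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print(norm_before, norm_after)
```

All the gradients are scaled by the same factor, so the direction of the update is preserved and only its length is capped. In Keras, the optimizers accept clipnorm and clipvalue arguments for this purpose, so you rarely need to write it yourself.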
Why is my accuracy not changing?
- Such a problem can have multiple causes. Start by checking the class distribution. If there is a severe class imbalance, a model may resort to always predicting the majority class, resulting in stagnant accuracy.
- Ensure that the generator used for generating mini-batches is working as expected.
- Instead of building an ambitious and complex model, start by building a minimalistic version and add layers based on the performance achieved.
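Checking that the generator "is working as expected" can be as simple as pulling one batch and inspecting it. The sketch below lists a few checks I find useful; check_batch is a hypothetical helper, and the expected shapes and value ranges are assumptions that match the CIFAR-10 setup used in this article.

```python
import numpy as np


def check_batch(batch_x, batch_y, n_classes):
    """Return a list of human-readable problems found in one (images, labels) batch."""
    problems = []
    if batch_x.ndim != 4 or batch_x.shape[1:] != (32, 32, 3):
        problems.append("images should have shape (batch, 32, 32, 3)")
    if not np.all(np.isfinite(batch_x)):
        problems.append("images contain NaN or Inf")
    elif batch_x.min() < 0.0 or batch_x.max() > 1.0:
        problems.append("pixel values should be scaled to [0, 1]")
    if batch_y.shape != (batch_x.shape[0], n_classes):
        problems.append("labels should have shape (batch, n_classes)")
    elif not np.all(batch_y.sum(axis=1) == 1):
        problems.append("every label row should be one-hot")
    return problems


# Stand-ins for one batch; in practice use next(train_data_gen) instead.
good_x = np.random.RandomState(0).rand(4, 32, 32, 3)
good_y = np.eye(10)[[1, 5, 5, 9]]
bad_x = good_x * 255.0  # forgot to rescale the pixels

print(check_batch(good_x, good_y, n_classes=10))
print(check_batch(bad_x, good_y, n_classes=10))
```

An empty list means the batch passed every check. Unscaled pixels or labels that are not one-hot are two common causes of a model that refuses to learn.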